The invention relates to a method for automated language analysis based on word selection and to a language analysis apparatus.
Psycholinguistics, or the psychology of language, is concerned with the processes of language acquisition, language comprehension (knowledge of the language) and language production:
    Language acquisition: how can children pick up and apply linguistic knowledge?
    Language comprehension: what means and what knowledge are used to grasp and establish the meanings and sense of words, sentences and texts?
    Language production: how can language be articulated in a way that conveys meaning?
Language psychologists distinguish four successive stages in these processes:
    (1) Conceptualization of an idea, thought or feeling to be expressed
    (2) Creation of a linguistic procedure
    (3) Articulation as the implementation of the procedure
    (4) Monitoring of the articulation
These processes are concerned with the cognitive, thought-based and knowledge-based aspects of language. Yet for many years psychologists showed almost no interest in how personality is expressed in language. The exception was Freud, although his theses still lack an empirical basis.
In addition to context-based qualitative analysis models, which were geared towards relatively large language units such as sentences and entire texts and some of which were extremely time-consuming and resource-intensive, automated analysis techniques known as “word-count methods” emerged in the 1960s and 1970s with the advent of the computer. These methods used an objective, quantitative approach and were based on small language units.
U.S. Pat. No. 5,696,981 B1 discloses a method and a language analysis apparatus for automated personality analysis based on word selection. A program stored on a personal computer contains lists of keywords, each list comprising a multiplicity of words associated with one of six personality types. For the personality analysis, a language file is recorded that comprises a multiplicity of words of the person to be analyzed. The words may originate, for example, from a questionnaire, a monologue or a text written by the person. They are entered into the computer system as text or, in the case of a monologue, converted automatically into a language file by voice recognition. In the automated personality analysis, the language file is examined to determine whether it contains keywords from the six keyword lists. This is a multistage process in which the keywords identified in the language file are weighted and, on the basis of the keywords contained in the six lists, associated with one of the six personality types according to context. To output the result of the personality analysis, the weighted keywords are summed for each of the six personality types, and the person's association with each of the six personality types is expressed as a percentage, represented in particular in a bar chart.
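The keyword-scoring steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method of U.S. Pat. No. 5,696,981: the six type labels "A" to "F", the keywords and their weights are hypothetical placeholders, and the cited context-dependent, multistage weighting is replaced by a fixed weight per keyword. The sketch counts keyword hits in the language file, sums the weighted hits per type, converts the sums to percentages and renders them as a text bar chart.

```python
import re
from collections import Counter

# Hypothetical keyword lists: each of six personality types (placeholder
# labels "A" to "F") maps keywords to an assumed fixed weight. The cited
# patent weights keywords according to context; that step is not modelled here.
KEYWORD_LISTS: dict[str, dict[str, float]] = {
    "A": {"plan": 1.0, "goal": 1.5},
    "B": {"feel": 1.0, "together": 1.5},
    "C": {"think": 1.0, "analyze": 2.0},
    "D": {"fun": 1.0, "exciting": 1.5},
    "E": {"rule": 1.0, "duty": 1.5},
    "F": {"imagine": 1.0, "dream": 1.5},
}


def tokenize(text: str) -> list[str]:
    """Split a language file (plain text) into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def personality_profile(text: str) -> dict[str, float]:
    """Return the share (in percent) of weighted keyword hits per type."""
    counts = Counter(tokenize(text))
    # Sum the weighted keyword occurrences separately for each type.
    scores = {
        ptype: sum(weight * counts[word] for word, weight in keywords.items())
        for ptype, keywords in KEYWORD_LISTS.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No keyword found: report 0 % for every type.
        return {ptype: 0.0 for ptype in scores}
    return {ptype: 100.0 * score / total for ptype, score in scores.items()}


def bar_chart(profile: dict[str, float], width: int = 40) -> str:
    """Render the percentages as a simple text bar chart, one line per type."""
    lines = []
    for ptype, pct in profile.items():
        bar = "#" * round(width * pct / 100)
        lines.append(f"{ptype} {bar} {pct:.1f}%")
    return "\n".join(lines)
```

For the input "I think we should plan and analyze the goal.", `personality_profile` finds "plan" and "goal" (type A) and "think" and "analyze" (type C), so types A and C receive roughly 45 % and 55 % and the other four types 0 %; `bar_chart` prints these shares as bars.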
A disadvantage of the known method is that it permits only an automated personality analysis based on word selection. Other characteristics of the person from whom the language file originates cannot be determined with this method. The short keyword lists overlap with only a very small part of the recorded language files, so the personality analysis takes into account only a small part of each language file. In addition, the keyword lists and the analysis based on them are static, and hence the quality of the personality analysis depends heavily on the choice of keywords.
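The limited overlap between keyword lists and language files can be made concrete by measuring keyword coverage, that is, the share of word tokens in a language file that match any keyword. The function below is a hypothetical illustration, not something defined in the cited patent; the tokenizer and the keyword set are assumptions.

```python
import re


def keyword_coverage(text: str, keywords: set[str]) -> float:
    """Return the fraction of word tokens in `text` that appear in `keywords`.

    Assumption: tokens are lowercase runs of letters and apostrophes;
    every other character acts as a separator.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in keywords)
    return hits / len(tokens)
```

With a four-word keyword set such as {"plan", "goal", "think", "analyze"}, the 27-word sentence "Yesterday I met my old friends at the station and we talked for hours about work, travel and family before I had to plan the trip home." contains only "plan", giving a coverage of 1/27, i.e. under 4 %. The remaining words play no part in a keyword-based analysis.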